Many databases contain uncertain and imprecise references to real-world entities.
The absence of identifiers for the underlying entities often results in a database 
which contains multiple references to the same entity. This can lead not only to 
data ...

This paper introduces Jump, a prototype computer vision-based system that
transforms paper-based architectural documents into tangible query interfaces. 
Specifically, Jump allows a user to obtain additional information related to a given .

Databases of text and text-annotated data constitute a significant fraction of the
information available in electronic form. Searching and browsing are the typical ways 
that users locate items of interest in such databases. Interfaces that use 
multifaceted ...

Confidence measures are a practical solution for improving the usefulness of Natural
Language Processing applications. Confidence estimation is a generic machine 
learning approach for deriving confidence measures. We give an overview of the 
application ...

estimation, speech recognition, spoken language understanding, statistical machine
learning

Text is ubiquitous and, not surprisingly, many important applications rely on textual
data for a variety of tasks. As a notable example, information extraction applications 
derive structured relations from unstructured text; as another example, focused ...

The subject of this article is differential compression, the algorithmic task of finding
common strings between versions of data and using them to encode one version 
compactly by describing it as a set of changes from its companion. A main goal ... 

Keywords: Delta compression, differencing, differential compression

Reuters Corpus Volume I (RCV1) is an archive of over 800,000 manually categorized
newswire stories recently made available by Reuters, Ltd. for research purposes. Use 
of this data for research on text categorization requires a detailed understanding ...

XML is widely recognized as the data interchange standard of tomorrow because of
its ability to represent data from a variety of sources. Hence, XML is likely to be the 
format through which data from multiple sources is integrated. In this article, we ...

Multiple selections, though heavily used in file managers and drawing editors, are
virtually nonexistent in text editing. This paper describes how multiple selections can 
automate repetitive text editing. Selection guessing infers a multiple selection ... 

Keywords: LAPIS, PBD, automated text editing, pattern matching,
programming-by-demonstration, search-and-replace

In this paper, we present (a) a method for identifying documents captured from
low-resolution devices such as web-cams, digital cameras or mobile phones and (b) a
technique for extracting their textual content without performing OCR. The first 
method ...

As the gap between processor and memory speeds continues to widen, methods for
evaluating memory system designs before they are implemented in hardware are 
becoming increasingly important. One such method, trace-driven memory 
simulation, has been the ... 

Keywords: TLBs, caches, memory management, memory simulation, trace-driven 
simulation

Metric space searching is an emerging technique to address the problem of efficient
similarity searching in many applications, including multimedia databases and other 
repositories handling complex objects. Although promising, the metric space 
approach ... 

Keywords: Multimedia databases, similarity or proximity search, spatial and 
multidimensional search, spatial approximation tree

We investigate the problem of summarizing text documents that contain errors as a
result of optical character recognition. Each stage in the process is tested, the error 
effects analyzed, and possible solutions suggested. Our experimental results show ...

CiteSeer is a scientific literature digital library and search engine which automatically
crawls and indexes scientific documents in the fields of computer and information 
science. Since its inception in 1997, CiteSeer has grown to index over 730,000 ...

The management of electronic document collections is fundamentally different from
the management of paper documents. The ephemeral nature of some electronic 
documents means that the document address (i.e., reference details of the 
document) can become ...

The vast amount of textual information available today is useless unless it can be
effectively and efficiently searched. The goal in information retrieval is to find 
documents that are relevant to a given user query. We can represent a document
collection ...

Very often in the digitization process, documents are either not placed with the
correct orientation or are rotated by small angles in relation to the original image
axis. These factors make the visualization of images by human users more difficult

Keywords: monochromatic document image, orientation and skew detection

This article addresses the problem of processing the annotations of preexisting video
productions to enable reuse and repurposing of metadata. We introduce the concept 
of automatic content-based editing of preexisting semantic home video metadata. 
We ... 

Keywords: Preexisting video, content, editing, metadata, representation, reuse, 
semantic

In this article, we propose a new postprocessing strategy, word suggestion, based on
a multiple word trigger-pair language model for Chinese character recognizers. With 
the word suggestion strategy, Chinese character recognizers may even achieve a 
recognition ... 

Keywords: Chinese character recognizer, postprocessing 